{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# API Authorization and Authentication\n",
        "\n",
        "[txtai](https://github.com/neuml/txtai) is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows. txtai can run in Python, with YAML configuration and through an API service.\n",
        "\n",
        "The default API service implementation runs without any security. That may be acceptable for a local prototype or a small internal network, but most deployments need additional security measures.\n",
        "\n",
        "This notebook will demonstrate how to add authorization, authentication and middleware dependencies to a txtai API service."
      ],
      "metadata": {
        "id": "VGeVB8M41jqW"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Install dependencies\n",
        "\n",
        "Install `txtai` and all dependencies. Since this notebook uses the API, we need to install the api extras package."
      ],
      "metadata": {
        "id": "ZQrHIw351lwE"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "R0AqRP7v1hdr"
      },
      "outputs": [],
      "source": [
        "%%capture\n",
        "!pip install git+https://github.com/neuml/txtai#egg=txtai[api]"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Create an API Service\n",
        "\n",
        "For this example, we'll load an existing txtai index from the Hugging Face Hub."
      ],
      "metadata": {
        "id": "32xg8L1JHd3d"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "%%writefile config.yml\n",
        "cloud:\n",
        "  provider: huggingface-hub\n",
        "  container: neuml/txtai-intro\n",
        "\n",
        "embeddings:"
      ],
      "metadata": {
        "id": "XZ7vPBIs1rGZ",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "468b09cb-ba01-426d-9640-142acf1e3dd9"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Writing config.yml\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
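        "Next, we'll generate a test token to use for this notebook. Note that `uuid5` is deterministic, so this value is only suitable for a demonstration. Production tokens should be randomly generated."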
      ],
      "metadata": {
        "id": "Eu8rtjB_QZOg"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import uuid\n",
        "str(uuid.uuid5(uuid.NAMESPACE_DNS, \"TokenTest\"))"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 38
        },
        "id": "Np18NtytQG8k",
        "outputId": "5bf60d69-311d-4a02-f3dd-d210aedc8b4f"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "'edd590d3-bfab-5425-8a85-79b01e3127ee'"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            }
          },
          "metadata": {},
          "execution_count": 3
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "txtai has a built-in API token authorization method. We'll set the `TOKEN` environment variable to the SHA-256 hash of the test token and start the service. Clients then send the original token in an `Authorization: Bearer` header.\n",
        "\n",
        "**Note that this service runs over HTTP for demonstration purposes only. HTTPS must be added, either with a proxy service like NGINX or by passing an SSL certificate to Uvicorn. See [this link](https://neuml.github.io/txtai/api/security/) for more.**"
      ],
      "metadata": {
        "id": "ehZSh97_Nr7h"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!CONFIG=config.yml TOKEN=`echo -n 'edd590d3-bfab-5425-8a85-79b01e3127ee' | sha256sum | head -c 64` uvicorn \"txtai.api:app\" &> api.log &\n",
        "!sleep 60"
      ],
      "metadata": {
        "id": "PjKT3vOuNkfX"
      },
      "execution_count": null,
      "outputs": []
    },
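    {
      "cell_type": "markdown",
      "source": [
        "The `TOKEN` value in the command above is the SHA-256 hex digest of the test token. As a rough sketch, the same value can be computed in Python with `hashlib`. The `is_authorized` helper is hypothetical and only illustrates how a bearer token could be checked against a stored hash. It is not txtai's implementation.\n",
        "\n",
        "```python\n",
        "import hashlib\n",
        "import hmac\n",
        "\n",
        "token = \"edd590d3-bfab-5425-8a85-79b01e3127ee\"\n",
        "\n",
        "# Same result as: echo -n <token> | sha256sum | head -c 64\n",
        "hashed = hashlib.sha256(token.encode(\"utf-8\")).hexdigest()\n",
        "\n",
        "\n",
        "def is_authorized(header, expected):\n",
        "    \"\"\"Hypothetical check: hash the bearer token and compare it to the stored hash.\"\"\"\n",
        "    prefix = \"Bearer \"\n",
        "    if not header or not header.startswith(prefix):\n",
        "        return False\n",
        "\n",
        "    digest = hashlib.sha256(header[len(prefix):].encode(\"utf-8\")).hexdigest()\n",
        "\n",
        "    # Constant-time comparison of the two hex digests\n",
        "    return hmac.compare_digest(digest, expected)\n",
        "\n",
        "\n",
        "print(len(hashed))\n",
        "print(is_authorized(f\"Bearer {token}\", hashed))\n",
        "print(is_authorized(\"Bearer junk\", hashed))\n",
        "```\n",
        "\n",
        "Storing only the hash means the raw token never has to appear in the service configuration."
      ],
      "metadata": {}
    },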
    {
      "cell_type": "markdown",
      "source": [
        "# Connect to Service\n",
        "\n",
        "First, we'll try a request with no token to see what happens."
      ],
      "metadata": {
        "id": "NfOn1vgwRF2C"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!curl -X GET -I 'http://localhost:8000/search?query=feel+good+story&limit=1'"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "9Ljn13HWOb26",
        "outputId": "4e437ebb-cf92-4e7d-c804-25bb42f6cb30"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "HTTP/1.1 401 Unauthorized\r\n",
            "\u001b[1mdate\u001b[0m: Thu, 04 Jan 2024 15:08:38 GMT\r\n",
            "\u001b[1mserver\u001b[0m: uvicorn\r\n",
            "\u001b[1mcontent-length\u001b[0m: 40\r\n",
            "\u001b[1mcontent-type\u001b[0m: application/json\r\n",
            "\r\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "As expected, we received an HTTP 401 response indicating that the request is not authorized.\n",
        "\n",
        "Now let's try an invalid token."
      ],
      "metadata": {
        "id": "Rvi4PRSTRTjt"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!curl -X GET -I 'http://localhost:8000/search?query=feel+good+story&limit=1' -H 'Authorization: Bearer junk'"
      ],
      "metadata": {
        "id": "vJvQHtcJdRXb",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "099577d1-2ed0-4051-827d-2ab72d5929fa"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "HTTP/1.1 401 Unauthorized\r\n",
            "\u001b[1mdate\u001b[0m: Thu, 04 Jan 2024 15:08:38 GMT\r\n",
            "\u001b[1mserver\u001b[0m: uvicorn\r\n",
            "\u001b[1mcontent-length\u001b[0m: 40\r\n",
            "\u001b[1mcontent-type\u001b[0m: application/json\r\n",
            "\r\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Once again, the request is rejected.\n",
        "\n",
        "Let's try again, this time passing a valid API token."
      ],
      "metadata": {
        "id": "7oPf-Crb6ETH"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!curl -X GET 'http://localhost:8000/search?query=feel+good+story&limit=1' -H 'Authorization: Bearer edd590d3-bfab-5425-8a85-79b01e3127ee'"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "HNxzVN1CSE1C",
        "outputId": "77962561-52fa-4697-9c0c-bc2d630ea8b4"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "[{\"id\":\"4\",\"text\":\"Maine man wins $1M from $25 lottery ticket\",\"score\":0.08329025655984879}]"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "This time we get search results!"
      ],
      "metadata": {
        "id": "H025YmZ4TSSS"
      }
    },
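    {
      "cell_type": "markdown",
      "source": [
        "The same request can be made from Python. The snippet below is a minimal sketch using only the standard library. The `build_request` and `search` helpers are hypothetical names introduced for illustration, and `search` assumes the service from this notebook is running on `localhost:8000`.\n",
        "\n",
        "```python\n",
        "import json\n",
        "import urllib.parse\n",
        "import urllib.request\n",
        "\n",
        "\n",
        "def build_request(token, query, limit=1, base=\"http://localhost:8000\"):\n",
        "    \"\"\"Builds a GET /search request that carries the API token as a bearer token.\"\"\"\n",
        "    params = urllib.parse.urlencode({\"query\": query, \"limit\": limit})\n",
        "    request = urllib.request.Request(f\"{base}/search?{params}\")\n",
        "    request.add_header(\"Authorization\", f\"Bearer {token}\")\n",
        "    return request\n",
        "\n",
        "\n",
        "def search(token, query, limit=1):\n",
        "    \"\"\"Sends the request and parses the JSON response. Requires a running service.\"\"\"\n",
        "    with urllib.request.urlopen(build_request(token, query, limit)) as response:\n",
        "        return json.loads(response.read())\n",
        "\n",
        "\n",
        "request = build_request(\"edd590d3-bfab-5425-8a85-79b01e3127ee\", \"feel good story\")\n",
        "print(request.full_url)\n",
        "print(request.get_header(\"Authorization\"))\n",
        "```\n",
        "\n",
        "With a missing or invalid token, `urlopen` raises `urllib.error.HTTPError` for the 401 response rather than returning a result."
      ],
      "metadata": {}
    },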
    {
      "cell_type": "markdown",
      "source": [
        "# Dependencies\n",
        "\n",
        "Next, let's add a custom dependency to test out authentication. A dependency could validate user credentials against an external identity provider, such as OAuth, Active Directory, LDAP or another identity management service.\n",
        "\n",
        "For this simple example, we'll validate user credentials using basic HTTP authentication. The code below checks if a specific username and password are provided. It's based on [this FastAPI example](https://fastapi.tiangolo.com/advanced/security/http-basic-auth/)."
      ],
      "metadata": {
        "id": "3JjhlfhsUDNU"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "%%writefile authentication.py\n",
        "\n",
        "import secrets\n",
        "\n",
        "from fastapi import Depends, HTTPException, status\n",
        "from fastapi.security import HTTPBasic, HTTPBasicCredentials\n",
        "\n",
        "# Reads HTTP basic credentials from the Authorization header\n",
        "security = HTTPBasic()\n",
        "\n",
        "\n",
        "class Authentication:\n",
        "    \"\"\"\n",
        "    Callable dependency that checks HTTP basic credentials against a fixed username and password.\n",
        "    The hardcoded values are for demonstration only.\n",
        "    \"\"\"\n",
        "\n",
        "    def __call__(self, credentials: HTTPBasicCredentials = Depends(security)):\n",
        "        # compare_digest compares in constant time, which helps guard against timing attacks\n",
        "        user = credentials.username.encode(\"utf8\")\n",
        "        validuser = secrets.compare_digest(user, b\"txtai\")\n",
        "\n",
        "        password = credentials.password.encode(\"utf8\")\n",
        "        validpassword = secrets.compare_digest(password, b\"theembeddingsdb\")\n",
        "\n",
        "        # Reject the request unless both the username and password match\n",
        "        if not (validuser and validpassword):\n",
        "            raise HTTPException(\n",
        "                status_code=status.HTTP_401_UNAUTHORIZED,\n",
        "                detail=\"Incorrect user or password\",\n",
        "                headers={\"WWW-Authenticate\": \"Basic\"},\n",
        "            )\n",
        "\n",
        "        return credentials.username"
      ],
      "metadata": {
        "id": "1W9Vw0PMUa83",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "8bece467-588d-4d80-98a0-2451d7ebf85e"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Writing authentication.py\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Now let's restart the application with this dependency added.\n",
        "\n",
        "**As noted above, this demonstration uses HTTP. Real deployments must use HTTPS.**"
      ],
      "metadata": {
        "id": "VNYHD20xhVAS"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!killall -9 uvicorn\n",
        "!CONFIG=config.yml DEPENDENCIES=authentication.Authentication uvicorn \"txtai.api:app\" &> api.log &\n",
        "!sleep 30"
      ],
      "metadata": {
        "id": "F60-Aakyg1x_"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "!curl -X GET -I 'http://localhost:8000/search?query=feel+good+story&limit=1'"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "jda9RugAhcNT",
        "outputId": "89111cdc-bee4-4cec-83ce-de3a4bf8e21a"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "HTTP/1.1 401 Unauthorized\r\n",
            "\u001b[1mdate\u001b[0m: Thu, 04 Jan 2024 15:09:10 GMT\r\n",
            "\u001b[1mserver\u001b[0m: uvicorn\r\n",
            "\u001b[1mwww-authenticate\u001b[0m: Basic\r\n",
            "\u001b[1mcontent-length\u001b[0m: 30\r\n",
            "\u001b[1mcontent-type\u001b[0m: application/json\r\n",
            "\r\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "The request is rejected as expected. Next, let's try an invalid username and password."
      ],
      "metadata": {
        "id": "XZCsCDOG6VBS"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!curl -X GET -I 'http://localhost:8000/search?query=feel+good+story&limit=1' -H \"Authorization: Basic junk\""
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "VFj1UPuC6c4P",
        "outputId": "4fa7d6df-c06a-4cfe-88ff-4c341783704d"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "HTTP/1.1 401 Unauthorized\r\n",
            "\u001b[1mdate\u001b[0m: Thu, 04 Jan 2024 15:09:10 GMT\r\n",
            "\u001b[1mserver\u001b[0m: uvicorn\r\n",
            "\u001b[1mwww-authenticate\u001b[0m: Basic\r\n",
            "\u001b[1mcontent-length\u001b[0m: 47\r\n",
            "\u001b[1mcontent-type\u001b[0m: application/json\r\n",
            "\r\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Once again, the request is rejected.\n",
        "\n",
        "Now we'll add the expected username and password to the request. [HTTP basic authentication](https://en.wikipedia.org/wiki/Basic_access_authentication) concatenates the username and password, separated by a colon, and then Base64-encodes the result."
      ],
      "metadata": {
        "id": "jwZYyTDKh5pB"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!echo -n txtai:theembeddingsdb | base64"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "uL943ry3h2hR",
        "outputId": "aa2bc9a6-cd0a-42aa-933c-3cb5aa579588"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "dHh0YWk6dGhlZW1iZWRkaW5nc2Ri\n"
          ]
        }
      ]
    },
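    {
      "cell_type": "markdown",
      "source": [
        "For reference, the encoding step above can be sketched in Python with the standard `base64` module. The variable names are illustrative only.\n",
        "\n",
        "```python\n",
        "import base64\n",
        "\n",
        "username, password = \"txtai\", \"theembeddingsdb\"\n",
        "\n",
        "# Join the username and password with a colon, then Base64-encode the bytes\n",
        "encoded = base64.b64encode(f\"{username}:{password}\".encode(\"utf-8\")).decode(\"ascii\")\n",
        "header = {\"Authorization\": f\"Basic {encoded}\"}\n",
        "\n",
        "# Decoding reverses the process and recovers the original credentials\n",
        "decoded = base64.b64decode(encoded).decode(\"utf-8\")\n",
        "\n",
        "print(header)\n",
        "print(decoded)\n",
        "```\n",
        "\n",
        "Base64 is an encoding, not encryption. Anyone who can read the header can recover the password, which is why HTTPS matters here."
      ],
      "metadata": {}
    },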
    {
      "cell_type": "code",
      "source": [
        "!curl -X GET 'http://localhost:8000/search?query=feel+good+story&limit=1' -H \"Authorization: Basic dHh0YWk6dGhlZW1iZWRkaW5nc2Ri\""
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "X9vfyRsphxiA",
        "outputId": "01cfe7b9-80ee-4745-eae9-7db46b2ec554"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "[{\"id\":\"4\",\"text\":\"Maine man wins $1M from $25 lottery ticket\",\"score\":0.08329025655984879}]"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Now that we have the correct username/password, a response is returned!"
      ],
      "metadata": {
        "id": "M-NSiDGuiigq"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Wrapping up\n",
        "\n",
        "This notebook showed how to add authorization, authentication and middleware dependencies to a txtai API service. As noted above, make sure HTTPS is enabled when using this in production environments.\n",
        "\n",
        "For more advanced authentication methods, check out the [FastAPI security documentation](https://fastapi.tiangolo.com/tutorial/security/).\n"
      ],
      "metadata": {
        "id": "ZivWat7-r3ID"
      }
    }
  ]
}